Abstract

We use the Allen Gene Expression Atlas (AGEA) and the OMA ortholog dataset to investigate the
evolution of mouse-brain neuroanatomy from the standpoint of the molecular evolution of brain-specific
genes. For each such gene, using the phylogenetic tree for all fully sequenced species and the presence
of orthologs of the gene in these species, we construct and assign a discrete measure of evolutionary
age. The gene expression profile of all genes of similar age, relative to the average gene expression profile,
distinguishes regions of the brain that are over-represented in the corresponding evolutionary timescale. We
argue that the conclusions one can draw on the evolution of twelve major brain regions from such a
molecular-level analysis supplement existing knowledge of mouse brain evolution and introduce new quantitative
tools, especially for comparative studies, when AGEA-like data sets for other species become available.
Using the functional role of the genes representative of a certain evolutionary timescale and brain region,
we compare and contrast, wherever possible, our observations with existing knowledge in evolutionary
neuroanatomy.

Introduction

Investigations in brain evolution have traditionally used methods ranging from comparative neuroanatomy
to paleontology, and an extensive literature exists on the subject [1,2]. We may be at the crossroads
of a new era where the availability of gene expression data across the whole brain for multiple species
will provide new quantitative tools for studying brain evolution. We propose a method whereby the
molecular evolution of mouse brain-specific genes, coupled with the high-resolution map of their
brain-wide expression in the Allen Gene Expression Atlas (AGEA), can be used to draw conclusions on
the evolution of the brain. We create maps for subsets of AGEA mouse-brain genes grouped by their
approximate "evolutionary age", which is obtained by analyzing orthologs of the genes across all currently
fully sequenced species. These maps are a new resource for studying the evolution of major mouse brain
regions, deduced entirely from the molecular evolution of genes.

The AGEA [3,4] maps the expression levels of more than 20,000 genes in the mouse brain, determined at
high resolution (200 µm voxel size) using in situ hybridization of mRNA in a high-throughput manner.
Such a high-resolution dataset is unprecedented in any species, allowing us to formulate and address
new questions about brain evolution using gene expression. Past work by other groups has employed
various gene-expression datasets from multiple species for comparative studies of evolution in primates
(including humans), where the transcriptome of organs as a whole (including the brain) was considered [5-8].
Gene expression studies on the comparative neuroanatomy of avian and mammalian brains have generated
a new set of hypotheses on brain-region homologies; for a review see Ref. [5]. For the avian brain,
gene-expression studies have focused on learned vocalization [10]. Currently, AGEA-like high-resolution
gene-expression datasets are unavailable in any species other than mouse; however, creating a framework for
the systematic comparative study of brain evolution across species is of great interest. The methods we report
here can be easily extended when such datasets do become available. In the current work, we restrict our
attention to studying the evolutionary age of mouse brain-specific genes and their localization in brain
regions to investigate the evolution of those brain regions. Such an investigation supplements traditional
approaches in brain evolution. The question of whether signatures of the molecular evolution of genes are
informative about the evolution of brain regions is an important one, especially because brain regions have
traditionally been defined by considerations of function and morphology.

Gene-expression datasets have also been used to study the comparative evolution of brain regions in
humans. The Ruppin group [11] focused on the evolutionary rates of genes expressed in twenty-one different
human brain regions, using human and mouse brain tissue transcriptomes [12,13]. Amongst other
observations, the study highlighted the low evolutionary rate of genes over-expressed in the cortical brain
regions, which are more recent in evolutionary time than the non-cortical brain regions. They also
observed that brain-specific genes have a much lower median evolutionary rate than the rest, and that
genes that are more brain-region-specific in their expression have higher evolutionary rates. In contrast
to their approach, we focus on the evolutionary ages of mouse genes and not their rates.

Closer in spirit to our method is the work of the Grant group [14,15], which focused
on the evolution of synapses. It used 19 species (8 mammals, 6 additional chordates and 5 other eukaryotes),
and orthologs of a total of 651 mouse genes (corresponding to postsynaptic proteins) were identified
by data-mining Ensembl Compara [16] and protein-BLAST [17]. The work established that the core
components of the synapse originated in unicellular eukaryotes, where the genes were involved in
intercellular signaling and response to environmental stress. The postsynaptic genes most recent in origin
are typically enriched in upstream signaling and structural components, and also contributed most to
the variation of gene expression profiles across the mouse brain, where the expression diversity was
established by a hybrid of protein and mRNA data [18]. Immunohistochemistry of mouse brain sections
using antibodies to 43 different synaptic proteins was employed, and mRNA was examined using in situ
hybridization maps [19]. We use a similar ortholog search to ascertain the evolutionary ages of genes,
but we consider a large subset of brain-specific genes.

In contrast to previous studies, we present brain-wide maps of 'evolutionary ages' using more than
three thousand genes and an up-to-date ortholog dataset for all 108 sequenced eukaryotes. We harness the
full potential of the AGEA for this purpose, and perform a bootstrap analysis to ensure that the patterns of
'age'-grouped gene expression we report are statistically significant. The Allen Reference Atlas [20] is the
common three-dimensional atlas to which all the gene-expression data were registered, allowing us to study
the signatures of gene evolution on major neuroanatomical regions with unprecedented resolution. We
also harness the statistical tools developed for determining the localization of gene expression in the AGEA,
thereby reporting the most significant genes responsible for localized expression, classified by
evolutionary timescale.

Results 

Our results are based on the analysis of orthologs across 108 eukaryotic species and more than three
thousand brain-specific mouse genes for which brain-wide gene-expression data were generated by the
Allen Institute [3]. We computed the correlation between the sagittal and coronal gene-expression datasets
for 4,104 genes and retained the 3,041 genes that had the highest correlation across the two datasets. We use
the OMA dataset to determine the orthologs of these genes in all 108 species. The phylogenetic tree
of these species is known; see for example Refs. [21,22]. We reason that the evolutionary age of a gene
can be robustly mapped to a discrete score determined by considering the clade on the phylogenetic tree
where almost all the orthologs of the gene appear. For example, if a mouse gene has orthologs only in
vertebrates and not in any non-vertebrate chordate species, then the gene is a 'chordate gene', i.e. it is
at least as old as chordates. This discrete score, as opposed to a direct sequence-based evolutionary time
of divergence, is far less sensitive to uncertainties and noise (see Methods). One of the main challenges of
any such study is the limited number of fully sequenced species, and the under-representation of certain
clades among these species. For instance, there is only one amphibian (Xenopus tropicalis) in the list. We
strike a balance between probing clades of interest, where major events in brain evolution are known to
have occurred, and how well represented each clade is, so that statistically meaningful conclusions can be
drawn. Gene loss events can lead to the correlated disappearance of orthologs in a subset of the clade,
leading to erroneous conclusions about the first appearance of the gene on the tree. This problem is
somewhat alleviated by including a fair number of genes in each age group (see Methods).

Figure 1. Maximal-intensity projection of the average of the set of 3041 genes in the Allen Gene
Expression Atlas that have highest correlation between sagittal and coronal data.

We chose the following ordering of clades, each a progressive refinement of the previous one, leading
up to Rodentia. The last common ancestor of the species within a clade appearing earlier in this ordering
is older than the last common ancestor of a clade that appears later.

All ⊃ Eukaryota ⊃ Metazoa ⊃ Coelomata ⊃ Chordata ⊃ Vertebrata ⊃ Sarcopterygii ⊃
Mammalia ⊃ Eutheria ⊃ Euarchontoglires ⊃ Rodentia ⊃ Mouse

The result of the analysis of orthologs, as elaborated in Methods, is the assignment of a parameter
T ∈ {1, ..., 12} to each gene g, corresponding to when g first 'appeared' in the above discrete ordering
of clades. T is small for an 'old' gene and large for a 'new' gene. We denote by G_T the set of genes
appearing at T, and the set of all genes by G.

In previous work involving two of the authors [23], we have discussed the gene expression energies
E(v, g) representing the AGEA data, where v is the voxel index and g is the gene index. Briefly,
(a) in situ hybridization directly measures mRNA levels corresponding to a particular
gene g in an imaged section of the brain, denoted by the gray-scale intensity I(p, g) for pixel p; (b) a binary
mask M(p, g) is constructed to select for cell-shaped objects; (c) the following weighted sum over all pixels
p intersecting a voxel v defines E(v, g):

\[ E(v,g) = \frac{1}{N_p} \sum_{p \in v} M(p,g)\, I(p,g), \tag{1} \]

where N_p is the number of pixels contributing to a voxel.
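As an illustration of steps (a) to (c), the following minimal sketch computes the expression energy of a single voxel as the mask-weighted sum of pixel intensities divided by the number of contributing pixels. The function name and the toy pixel values are hypothetical, and the normalization by N_p is an assumption based on the description above rather than the AGEA pipeline itself.

```python
def expression_energy(intensity, mask):
    """E(v, g) for one voxel: mask-weighted sum of intensities over N_p pixels.

    intensity: gray-scale values I(p, g) for the pixels p falling in voxel v.
    mask:      binary values M(p, g) (1 = pixel inside a cell-shaped object).
    """
    if len(intensity) != len(mask):
        raise ValueError("intensity and mask must have the same length")
    n_pixels = len(intensity)  # N_p, the number of pixels contributing to the voxel
    if n_pixels == 0:
        return 0.0
    return sum(m * i for m, i in zip(mask, intensity)) / n_pixels


# Toy voxel with four pixels, two of which are flagged by the mask (made-up values).
energy = expression_energy([0.8, 0.2, 0.6, 0.0], [1, 0, 1, 0])
```

Pixels outside the mask contribute zero to the sum but still count towards N_p, so a voxel with sparse expression receives a low energy.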

The AGEA also provides parcellations of the brain at various degrees of coarseness. One of them,
which will be referred to as 'Big12', defines 12 different regions of the left hemisphere: Cerebral
Cortex, Olfactory areas, Hippocampal region, Retrohippocampal region, Striatum, Pallidum, Thalamus,
Hypothalamus, Midbrain, Pons, Medulla and Cerebellum (see Supplementary Materials). In a more detailed
annotation, each of these regions is divided into sub-regions, comprising a total of 94 brain regions in the
left hemisphere, which we refer to as 'Fine'. For display purposes we show Maximum Intensity Projections of
E(v, g); such projections onto the coronal, sagittal and axial planes, averaged over all genes, are shown in Fig. 1.

Genes novel in clade    Regions significantly over-expressed (significant L_T)

Eukaryotes              None
Metazoa                 Pallidum and Midbrain
Coelomata               None
Chordata                Hippocampal region, Thalamus, Pons and Medulla
Vertebrata              Olfactory areas and Cerebellum
Sarcopterygii           Olfactory areas
Mammalia                Hypothalamus
Eutheria                Cerebellum
Euarchontoglires        Pallidum, Hypothalamus, Midbrain, Pons, Medulla
Rodentia                Hypothalamus, Striatum

Table 1. Regions where dated genes grouped by clade are over-expressed with statistical significance
(two standard deviations or more).

For genes in the dated set G_T, we create an age-selected brain-wide gene expression profile E_T(v), defined
by

\[ E_T(v) = \frac{1}{|G_T|} \sum_{g \in G_T} E(v,g), \tag{2} \]

where |G_T| is the number of genes in the set G_T. The over-expression of genes belonging to G_T relative to
the average level of expression in v can be captured (on a logarithmic scale) by the quantity L_T(v), defined
as

\[ L_T(v) = \log\left( \frac{E_T(v)}{E_{tot}(v)} \right), \tag{3} \]

\[ E_{tot}(v) = \frac{1}{N_G} \sum_{g=1}^{|G|} E(v,g), \tag{4} \]

where N_G = |G| is the total number of genes.
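The voxel-wise log-ratio can be sketched in a few lines. The example below assumes the expression energies are held in a dictionary from gene name to a list of per-voxel values, that all averages are strictly positive, and that the logarithm is natural (the base is not specified in the text); the gene names and numbers are made up.

```python
import math


def mean_profile(energies, genes):
    """Average E(v, g) over the given genes, voxel by voxel."""
    n_voxels = len(next(iter(energies.values())))
    return [sum(energies[g][v] for g in genes) / len(genes) for v in range(n_voxels)]


def log_ratio_profile(energies, dated_genes):
    """L_T(v) = log(E_T(v) / E_tot(v)) for every voxel v."""
    e_t = mean_profile(energies, dated_genes)        # Eq. 2: average over G_T
    e_tot = mean_profile(energies, list(energies))   # Eq. 4: average over all genes
    return [math.log(a / b) for a, b in zip(e_t, e_tot)]  # Eq. 3


# Hypothetical energies for three genes over two voxels.
energies = {"g1": [4.0, 1.0], "g2": [2.0, 1.0], "g3": [0.0, 4.0]}
l_t = log_ratio_profile(energies, ["g1", "g2"])
```

A positive entry of `l_t` marks a voxel where the age-selected genes are expressed above the all-gene average, a negative entry one where they are expressed below it.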

We show the Maximal-Intensity Projections of L_T in Fig. 2, summarizing our main results. The
patterning of L_T across the brain regions informs us about the signature left by molecular evolution on
neuroanatomical regions. Our key observations on the significant over-expression, as judged by L_T, are
summarized in Table 1, which we discuss in detail in the Discussion section.

What are the attributes of the genes that contribute significantly to the over-expression in a specific 
brain region at a specific timescale? The list of dated genes significantly over-expressed in a specific brain 
region can be large, and we need to rank their importance. This is done by computing the localization 
score λ(g, R) of genes g in brain region R; see Ref. [24] for further details. The definition of λ(g, R) is as
follows:

\[ \lambda(g,R) = \frac{\sum_{v \in R} E(v,g)^2}{\sum_{v \in \Omega} E(v,g)^2}, \tag{5} \]

where Ω denotes the whole brain. The score λ(g, R) is a positive number between zero and one; genes with
expression contained within a specific brain region enjoy a high localization score. For the regions in
Table 1 corresponding to the clades, we report the attributes of the dated genes with high localization scores
λ(g, R) in those regions. These attributes are obtained from the Mouse Genome Informatics database [25].
The functional annotations and names of the genes are reported in Table 2, which is primarily meant as a
practical summary of an otherwise long list of diverse genes. A comprehensive list can be found in the
Supplementary Material; see further discussion of Tables 1 and 2 in the Discussion section.
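Ranking genes by localization can be sketched as follows. The sketch assumes that the score is the fraction of a gene's squared expression energy lying inside the region, which is our reading of the definition in Ref. [24] and should be checked against that reference; the gene names, voxel indices and values are hypothetical.

```python
def localization_score(energy, region_voxels):
    """Fraction of a gene's squared expression energy that lies inside a region.

    The squared-energy form is an assumption; energy is a list of E(v, g) over
    all voxels of the brain, region_voxels the voxel indices of the region.
    """
    total = sum(e * e for e in energy)  # sum over the whole brain
    if total == 0.0:
        return 0.0
    inside = sum(energy[v] ** 2 for v in region_voxels)
    return inside / total


def rank_genes(energies, region_voxels):
    """Gene names sorted from most to least localized in the region."""
    return sorted(
        energies,
        key=lambda g: localization_score(energies[g], region_voxels),
        reverse=True,
    )


# Two made-up genes over four voxels; the region covers voxels 0 and 1.
energies = {"gA": [3.0, 4.0, 0.0, 0.0], "gB": [1.0, 1.0, 1.0, 1.0]}
region = [0, 1]
ranking = rank_genes(energies, region)
```

Here `gA` is expressed only inside the region and therefore ranks above the uniformly expressed `gB`.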
Appearance of clade (number of species): Eukaryota (38), Metazoa (8), Coelomata (13), Chordata (3),
Vertebrata (5), Sarcopterygii (5), Mammalia (3), Eutheria (16), Euarchontoglires (12), Rodentia (5), Mouse (1)

Nb of genes: 159, 52, 443, 328, 70, 35, 47, 388
Figure 2. Heat maps of the log-ratios of the gene expression of age-selected genes, L_T, to the average
across the AGEA. Profile of 'Fine' atlas regions (left hemisphere) where L_T is 2 standard
deviations higher than the (bootstrap) mean; if no such region is found, the most significant one is
shown.
Genes novel in clade    Gene attributes of high localization-score genes in regions of Table 1

Eukaryotes              None

Metazoa                 chloride channel, GABA transporter, chemokines, sodium/calcium exchange,
                        potassium channel ...

Coelomata               None

Chordata                Retinoic acid receptor, contactins, growth differentiation factor, calcium channel,
                        potassium channel, otic morphogenesis, glial fibrillary acidic protein, anion exchanger,
                        peripherin, actin cytoskeletal related proteins, frizzled-related proteins ...

Vertebrata              Synaptotagmin, versican, calcium channel voltage-dependent subunits,
                        frizzled-related proteins, corticotropin releasing hormone,
                        receptor (calcitonin) activity modifier, cadherin-related, semaphorins,
                        Adenomatous polyposis coli (APC), olfactory receptors, glutamate receptor,
                        ephrin receptor, GABA receptor subunit, neurogenic differentiation, opsins,
                        otic morphogenesis, endothelin receptors, cannabinoid receptor, cholinergic receptor,
                        visual system homeobox ...

Sarcopterygii           RAS activator, neuron differentiation, thyrotropin releasing hormone receptor,
                        semaphorins, complexin, potassium voltage-gated channel ...

Mammalia                pregnancy-upregulated kinase, cytokine receptor, neurexophilin ...

Eutheria                Purkinje cell protein, cardiotrophin, titin ...

Euarchontoglires        insulin receptor substrate, GABA receptor subunit,
                        thyrotropin releasing hormone, huntingtin-associated protein, neuronatin, necdin,
                        nerve-growth factor receptor, calcitonin-related, growth-hormone-releasing,
                        chloride channel ...

Rodentia                metal ion transporter, trophinin ...

Table 2. Summary of annotation of genes with high localization scores in the regions listed in Table 1.



Materials and Methods 

Determining the age of genes 

Algorithms and biologically relevant criteria for determining gene orthologs through sequence
alignment have been discussed extensively in the literature [26-33]. Gene fusion/fission events lead to
non-orthologous genes sharing similar protein domains; therefore, the query sequence is usually mapped to
the longest transcript (protein sequence) and a minimum-length criterion on the sequence alignment is
imposed for orthology. A tool like protein-BLAST is usually used to search for similar sequences across
the whole genome. However, sequence similarity is not sufficient to establish orthology, and various
algorithms and databases have been developed for reliable genome-wide orthology [33-38]. For example,
reciprocal gene loss events in two species may lead to the lone surviving paralogs in each species being
detected as orthologs, and therefore a distance criterion (based on the amino-acid substitution rate) is
necessary to filter sequence similarity results. Though earlier work on neuronal genes [15] has depended
solely on BLAST tools, we use the OMA dataset [39-41], comprising a comprehensive list of all orthologs
found for all fully sequenced species. Our main reasons for choosing this dataset are:

• The computational rigor employed in determining sequence similarity and declaring orthology. The
sequence alignment tool is Smith-Waterman; a relaxed Reciprocal Smallest Distance criterion
using the evolutionary distance between sequences is employed to determine candidate orthologs;
and triangle equalities for pairwise distances between such candidates of the two query species,
as well as a third 'witness' species (that has an older common ancestor), are used to filter potential
orthologs.

• An up-to-date analysis of all pairwise orthologs of all fully sequenced eukaryotes is available.



To determine the 'age' of a gene we proceed as follows. We use the order of clades introduced in the
Results section. Denote by S_T the set of species that belong to clade T in this ordering. Consider the species
in the set difference A_T = S_T \ S_{T+1}. Orthologs of a gene g in this set A_T bear witness to whether g
was present in the common ancestor of S_T. Naively, one expects to observe a sudden appearance of g
in A_T for some T, and its ubiquitous presence subsequently. However, gene loss events may lead to the
disappearance of a gene from an entire branch of the phylogenetic tree. Moreover, because we can only
sample a fraction of all speciation events in any evolutionary history, gene orthologs may appear in only
a few of the species in A_T: the ones that share a common ancestor with S_{T+1}. Thus, the frequency
of finding orthologs in S_T shows the signature of both the invasion of a gene and its loss on branches of the tree.
We fit the frequency profile to a phenomenological function with three parameters,



\[ f(s) = \frac{\exp[-\mu (s - T)]}{1 + \exp[-(s - T)/b]}, \tag{6} \]



where T is the 'age', µ corresponds to the loss rate, b to the rate of invasion, and s indexes the ordering of
clades. Further discussion is relegated to the Supplementary Material. As an example, the average profile
of a few random genes novel in the fourth clade is plotted in Fig. 3.

Figure 3. Average profile of genes novel in the fourth clade.

For our analysis, we round the fitting parameter T to obtain the discrete 'age' score for each gene.
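The fit and the rounding step can be illustrated with a small sketch. The text does not state how the three-parameter fit is carried out, so the least-squares grid search below is only an assumed stand-in, and the parameter grids and the synthetic frequency profile are hypothetical.

```python
import math


def profile(s, age, mu, b):
    """Eq. 6: logistic 'invasion' around clade index `age`, followed by exponential loss."""
    return math.exp(-mu * (s - age)) / (1.0 + math.exp(-(s - age) / b))


def fit_age(freqs, ages, mus, bs):
    """Least-squares grid search; freqs[i] is the ortholog frequency at clade index i + 1."""
    best = None
    for age in ages:
        for mu in mus:
            for b in bs:
                err = sum((profile(s, age, mu, b) - f) ** 2
                          for s, f in enumerate(freqs, start=1))
                if best is None or err < best[0]:
                    best = (err, age, mu, b)
    return best[1:]


# Synthetic frequency profile generated from known parameters (not real data).
true_age, true_mu, true_b = 4.0, 0.05, 0.3
freqs = [profile(s, true_age, true_mu, true_b) for s in range(1, 12)]

ages = [1 + 0.25 * k for k in range(41)]  # candidate T values from 1.0 to 11.0
mus = [0.01 * k for k in range(21)]       # candidate loss rates from 0.00 to 0.20
bs = [0.1 * k for k in range(1, 11)]      # candidate invasion widths from 0.1 to 1.0
age, mu, b = fit_age(freqs, ages, mus, bs)
discrete_age = round(age)                 # discrete 'age' score
```

In practice a continuous optimizer would likely replace the grid search; the grid is used here only to keep the sketch dependency-free.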



Analysis of gene expression 

In the Results section we report L_T(v) for genes grouped by 'age' T. Here we present an analysis of the
statistical significance of the patterns observed in L_T(v) using a bootstrap strategy.

Denote the size of the set G_T by |G_T|. We consider the log-relative gene expression energy L_T(R) for
each of the 'Big12' brain parcellations, indexed by R ∈ {1, ..., 12}, where the regions are {Cerebral Cortex,
Olfactory areas, Hippocampal region, Retrohippocampal region, Striatum, Pallidum, Thalamus, Hypothalamus,
Midbrain, Pons, Medulla, Cerebellum}. We repeatedly draw (ten thousand draws) a random set G_rand of
|G_T| genes from the full set of genes G (|G| = 3041) and compute L_T^rand(R), given by equations
equivalent to Eq. 3:



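\[ L_T(R) = \log\left( \frac{E_T(R)}{E_{tot}(R)} \right), \tag{7} \]

\[ E_T(R) = \frac{1}{|G_T|} \sum_{g \in G_T} \sum_{v \in R} E(v,g), \tag{8} \]

\[ E_{tot}(R) = \frac{1}{|G|} \sum_{g=1}^{|G|} \sum_{v \in R} E(v,g). \tag{9} \]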
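The region-level quantities differ from the voxel-level ones only by the additional sum over the voxels of a region. A minimal sketch, assuming a dictionary of per-voxel energies and a hypothetical mapping from region name to voxel indices (all values made up, natural logarithm assumed):

```python
import math


def region_energy(energies, genes, region_voxels):
    """Mean over `genes` of the energy summed over the voxels of one region (Eqs. 8 and 9)."""
    return sum(sum(energies[g][v] for v in region_voxels) for g in genes) / len(genes)


def region_log_ratio(energies, genes, region_voxels):
    """L(R) of Eq. 7 for one gene set and one region."""
    e_sel = region_energy(energies, genes, region_voxels)
    e_tot = region_energy(energies, list(energies), region_voxels)
    return math.log(e_sel / e_tot)


# Three made-up genes over three voxels, split into two hypothetical regions.
energies = {"g1": [2.0, 2.0, 1.0], "g2": [1.0, 1.0, 1.0], "g3": [0.0, 0.0, 4.0]}
regions = {"R1": [0, 1], "R2": [2]}
l_r1 = region_log_ratio(energies, ["g1"], regions["R1"])
l_r2 = region_log_ratio(energies, ["g1"], regions["R2"])
```

The same function serves for the dated set G_T and for a random draw: only the list of genes passed in changes.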



Comparison of the fluctuations in the region-specific log-relative gene expression energies L_T^rand(R), computed
for random draws, against the fluctuation in L_T(R), computed for the 'age'-specific genes G_T for each T,
informs us about the signal-to-noise ratio and the significance of the patterns we observe. A sample plot
of L_T(R) and L_T^rand(R) against R for the genes novel in Euarchontoglires is shown in Fig. 4 (also see
Supplementary Material). For this example, the patterns observed in R = 9, 10, 11, i.e. Midbrain, Pons
and Medulla, are at least three standard deviations stronger than expected by chance. Moreover, some of
the striking neuroanatomical patterns on the heat maps in Fig. 2 are confirmed at the level of two standard
deviations, for instance the high expression across the cerebellum in genes novel to Eutheria, and the high
expression in thalamus, hypothalamus and midbrain in genes novel to Euarchontoglires. We note that
the deviations from the mean can be larger in units of standard deviations when we perform the same
analysis using the 'Fine' atlas of the left hemisphere, which consists of 94 (non-hierarchical) regions. The
analogous plots of L_T as a function of the fine-anatomy labels lose clarity in display; we therefore report
the results as a heat map of statistical significance. The steps taken in the process are as follows:

• Compute L_T(R) and L_T^rand(R) for all R and all T.

• For each 'age' T and region R, compute

\[ \delta(T,R) = \frac{L_T(R) - \mathrm{mean}\left(L_T^{rand}(R)\right)}{\mathrm{std}\left(L_T^{rand}(R)\right)}, \]

where the mean and standard deviation are over the samples created by bootstrapping.

• For each age T, determine the regions R having values of δ(T, R) > δ_crit; we display results for
δ_crit = 2. These regions are the δ_crit-best regions.

• For each age T, we have a set of δ_crit-best regions. Construct the heat map H_T(δ_crit) according to the
rule that if a voxel belongs to one of the δ_crit-best regions, the voxel is colored in proportion to δ(T, R);
otherwise it is colored black. For T where no δ_crit-best region is found, report the region R with the
highest δ(T, R).
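The steps above can be sketched as follows. The sketch assumes that each gene's energy has already been summed over the voxels of a region, uses 2,000 draws instead of ten thousand to keep the example fast, and works on made-up energies for 40 hypothetical genes; the function names are ours.

```python
import math
import random
import statistics


def log_ratio(energy_in_region, genes):
    """L(R) for a gene set, given each gene's energy summed over region R."""
    e_sel = sum(energy_in_region[g] for g in genes) / len(genes)
    e_tot = sum(energy_in_region.values()) / len(energy_in_region)
    return math.log(e_sel / e_tot)


def bootstrap_delta(energy_in_region, dated_genes, n_draws=10000, seed=0):
    """delta(T, R): observed L_T(R) in units of the bootstrap standard deviation."""
    rng = random.Random(seed)
    all_genes = sorted(energy_in_region)
    draws = [log_ratio(energy_in_region, rng.sample(all_genes, len(dated_genes)))
             for _ in range(n_draws)]
    observed = log_ratio(energy_in_region, dated_genes)
    return (observed - statistics.mean(draws)) / statistics.stdev(draws)


def best_regions(delta_by_region, delta_crit=2.0):
    """Regions above the threshold, or the single top region if none qualifies."""
    hits = [r for r, d in delta_by_region.items() if d > delta_crit]
    return hits or [max(delta_by_region, key=delta_by_region.get)]


# Made-up data: 40 genes, the first five of which form the dated set and are
# strongly expressed in region "MID" but unremarkable in region "COR".
genes = [f"g{i:02d}" for i in range(40)]
dated = genes[:5]
mid = {g: 1.0 + 0.1 * (i % 7) + (3.0 if i < 5 else 0.0) for i, g in enumerate(genes)}
cor = {g: 1.0 + 0.1 * (i % 7) for i, g in enumerate(genes)}
deltas = {"MID": bootstrap_delta(mid, dated, n_draws=2000),
          "COR": bootstrap_delta(cor, dated, n_draws=2000)}
selected = best_regions(deltas)
```

Coloring the voxels of each selected region in proportion to its entry in `deltas`, and all other voxels black, would then give the heat map for that age.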

Maximal-intensity projections of the heat maps H_T(δ_crit) constructed with the above rules are reported in
Fig. 2. In Fig. 5 we summarize the results of the bootstrap analysis for the 'Big12' regions, where a heat map
of δ(T, R) is shown for each clade and each region. The heat map quantifies the significance one can
attach to the patterns observed in Fig. 2.



Discussion 



We discuss the implications of the observations summarized in Tables 1 and 2. The current work serves
as a resource for classifying AGEA genes by evolutionary age, and for determining the statistical
significance of over-expression in brain regions and the rank of localization in those regions. The resource
can be used for specific inquiries on genes of interest from a functional standpoint. Here, instead, we discuss
the broader implications of our findings.

Figure 4. Example of bootstrap analysis to estimate the significance of the observed log-ratio gene expression
pattern of dated genes. The blue line and the blue band are, respectively, the mean and one standard deviation
off the mean of L_T^rand, the expression profile of a randomly selected set of |G_T| genes.
The red line is the observed L_T for T corresponding to Euarchontoglires. The abbreviations on the
x-axis: BasicCOR = Cerebral cortex, OLF = Olfactory areas, HIP = Hippocampal region, RHP =
Retrohippocampal region, STR = Striatum, PAL = Pallidum, HYP = Hypothalamus, THA =
Thalamus, MID = Midbrain, PON = Pons, MED = Medulla, CER = Cerebellum.

The cerebellum is over-represented for genes dated to Vertebrata. This is consistent with the fact that
hagfishes, which are rather elementary vertebrates (lacking a vertebral column but possessing a skull),
lack a cerebellum. Moreover, jawless fishes like lampreys, which are ancient vertebrates, have a very small
cerebellum compared to cartilaginous fishes, which are more recent vertebrates [2]. With the exception of
mormyrids (weakly electric fishes), which have an unusually elaborate and sophisticated cerebellum, all fishes
have a small cerebellum. Ancient supra-mammals (ancestral amniotes) show an expansion of the cerebellum [2].
In accordance, we observe the Cerebellum over-represented again in Eutheria. Curiously, one of the genes
responsible for the latter is a Purkinje cell protein. Purkinje cells are found in all jawed vertebrates;
however, their morphology is remarkably similar in non-amphibian tetrapods [2, 42]. The functional role
of the genes that cause over-expression in the cerebellum in Eutheria is not entirely clear.

The olfactory areas are over-represented for genes dated to Vertebrata. Studies in the evolution of
olfaction have focused on the Olfactory Receptor (OR) genes [43-45] (these are not genes expressed in the
brain, however). Such studies reveal that the olfactory system can be traced back to all chordates, that
Tetrapods show a dramatic expansion of such genes, and that the olfactory areas in fishes are underdeveloped,
arguing for terrestrial adaptation imposing severe selective pressure on olfaction [46,47]. Interestingly,
Sarcopterygii (lobe-finned fishes), which include Tetrapods as a major clade, show over-expression in
Olfactory areas in our analysis.

Figure 5. Heatmap summarizing the significance of the observed pattern in L_T, determined by bootstrap
analysis. The abbreviations on the y-axis: COR = Cerebral cortex, OLF = Olfactory areas, HIP =
Hippocampal region, RHP = Retrohippocampal region, STR = Striatum, PAL = Pallidum, HYP =
Hypothalamus, THA = Thalamus, MID = Midbrain, PON = Pons, MED = Medulla, CER =
Cerebellum. The abbreviations on the x-axis are for the clade ordering: EUK = Eukaryota, MET =
Metazoa, COE = Coelomata, CHO = Chordata, VER = Vertebrata, SAR = Sarcopterygii, MAM =
Mammalia, EUT = Eutheria, EUA = Euarchontoglires, ROD = Rodentia, MOU = Mouse.

Genes related to ion channels, such as chloride and sodium/calcium exchange channels, are responsible
for the over-representation in Metazoa. The highly conserved nature of key ion channels across Metazoa has
been discussed in the literature [48,49]. It serves as a test of our analysis that cytoskeletal,
signal-transduction and ion-channel proteins and amino-acid transporters [50] are the only ones that are
over-represented in Metazoa.

The hypothalamus is not over-represented until genes dated to Mammalia. This is rather curious
because the hypothalamus is present across vertebrates. The mammalian hypothalamus is highly
developed compared to that of other amniotes, controlling temperature regulation, social and parental behavior,
fluid balance, milk flow, reproductive functions, the pacemaker for biological rhythms, etc. Though
several of these behaviors are well developed in birds and reptiles, the mammalian genes responsible for the
over-representation are pregnancy-related and implicated in circadian control of mammalian locomotion
(cardiotrophin-cytokine (CLC) related genes) [51]. The hypothalamus is over-represented in
Euarchontoglires and Rodentia, alluding to its further sophistication and specialization in primates and rodents.

The midbrain region is represented as early as Metazoa and reappears in Euarchontoglires. The
optic tectum, an important part of the midbrain, is one of the most conserved structures in the brains of
vertebrates [1,2]. However, our results seem to imply that major evolutionary changes have occurred
in the Euarchontoglires clade. The thalamus is also present in all vertebrates. We see it appear in
Chordates, perhaps owing to the differential elaboration of sensory and motor functions in early
chordates/craniates [15]. Note that glial fibrillary acidic protein (a cytoskeletal protein of astroglia) is one
of the genes responsible for the over-expression of regions in chordates; glial cells are present in almost all
vertebrates [2].

The dorsal striatopallidal complex has been identified in most branches of vertebrates, where its
GABAergic population has been reported [2]; major changes have occurred in amniotes. In birds and
mammals, new pathways have evolved concerning the regulation and initiation of voluntary movements,
selected by the challenges of a highly complex terrestrial environment. The evolution of the striatopallidal
complex is less understood in anamniotes. In our study, the Pallidum is over-represented in
Euarchontoglires, and the Striatum in Rodentia and Mouse-specific genes, suggesting a very high degree of
specialization for higher mammals. The genes responsible for such over-representation are GABA-receptor
related and pituitary-related. The weak representation of cortical regions is striking for all clades, though
the cortex is weakly represented in Rodentia in the 'Fine' parcellation, and the result is open to interpretation.

In the current work, we have attempted to draw conclusions on the evolution of the mouse brain using
the molecular evolution of expressed genes, establishing statistical significance using a large set of genes.
A question may remain as to why the molecular evolution of a set of brain-specific genes should leave a
signature on the evolution of neuroanatomical regions. In a complementary work by two of the authors of
this study, it was observed that the correlations in the gene-expression profiles in AGEA can be reproduced
with high confidence by assimilating two distinct datasets: the known distribution of several neuronal types
and the gene expression profiles of each of these neuronal types [23]. The evolution of neuronal types is
perhaps what connects molecular evolution to the specificity of neuroanatomical regions in the over-expression
we report, though that is not the only possible interpretation. We have offered both genetic and traditional
comparative neuroanatomy arguments to connect our observations to existing knowledge; however, the
information provided by our approach is quantitative and complementary. We hope to further our studies
in comparative brain evolution using gene expression (as opposed to anatomical homology) of multiple
species, as AGEA-like atlases for other species become available.